NSF PAR Search | NSF Public Access Repository


Confidence-ranked reconstruction of census microdata from published statistics

https://doi.org/10.1073/pnas.2218605120

Dick, Travis; Dwork, Cynthia; Kearns, Michael; Liu, Terrance; Roth, Aaron; Vietri, Giuseppe; Wu, Zhiwei Steven (February 2023, Proceedings of the National Academy of Sciences)

A reconstruction attack on a private dataset D takes as input some publicly accessible information about the dataset and produces a list of candidate elements of D. We introduce a class of data reconstruction attacks based on randomized methods for nonconvex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of D from aggregate query statistics Q(D) ∈ ℝ^m but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset D was sampled, demonstrating that they are exploiting information in the aggregate statistics Q(D) and not simply the overall structure of the distribution. In other words, the queries Q(D) are permitting reconstruction of elements of this dataset, not the distribution from which D was drawn. These findings are established both on 2010 US decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset and provide further motivation for the careful application of provably private techniques such as differential privacy.
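As a rough illustration of the idea in this abstract, the toy sketch below runs a randomized local search several times against exact 2-way marginal counts of a small binary dataset and ranks candidate rows by how many runs produced them. It is not the authors' attack: the search procedure, the choice of 2-way marginals, the vote-counting rule, and every name (`marginals`, `local_search`, `rank_candidates`) are assumptions made for illustration only.

```python
import itertools
import random
from collections import Counter


def marginals(rows, d):
    """All 2-way marginal counts: for each column pair and value pair,
    the number of rows that match (a stand-in for Q(D))."""
    stats = {}
    for i, j in itertools.combinations(range(d), 2):
        for a, b in itertools.product((0, 1), repeat=2):
            stats[(i, j, a, b)] = sum(1 for r in rows if r[i] == a and r[j] == b)
    return stats


def error(rows, target, d):
    """L1 distance between the candidate's statistics and the published ones."""
    got = marginals(rows, d)
    return sum(abs(got[k] - target[k]) for k in target)


def local_search(target, n, d, rng, steps=500):
    """Randomized search: start from random rows, flip one bit at a time,
    and keep a flip only if the error does not increase."""
    rows = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(n)]
    best = error(rows, target, d)
    for _ in range(steps):
        if best == 0:
            break
        r, c = rng.randrange(n), rng.randrange(d)
        old = rows[r]
        rows[r] = old[:c] + (1 - old[c],) + old[c + 1:]
        e = error(rows, target, d)
        if e <= best:
            best = e
        else:
            rows[r] = old  # undo the flip
    return rows, best


def rank_candidates(target, n, d, runs=10, seed=0):
    """Rank distinct candidate rows by the number of runs that produced them."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        rows, _ = local_search(target, n, d, rng)
        votes.update(set(rows))  # one vote per run per distinct row
    return votes.most_common()


# Hypothetical private dataset: 5 rows over 4 binary attributes.
private = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 1, 0)]
target = marginals(private, d=4)
ranking = rank_candidates(target, n=len(private), d=4)
```

Rows near the top of `ranking` are those the search returns most consistently, which in this toy setting serves as a crude confidence score; whether a given row is actually in `private` still has to be checked against the data.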
Full Text Available
Improved Regret for Differentially Private Exploration in Linear MDP

Ngo, Dung Daniel; Vietri, Giuseppe; Wu, Steven (July 2022, Proceedings of the 39th International Conference on Machine Learning)

Full Text Available
Improved Regret for Differentially Private Exploration in Linear MDP

Ngo, Dung Daniel; Vietri, Giuseppe; Wu, Zhiwei Steven (January 2022, Proceedings of the 39th International Conference on Machine Learning)

Full Text Available
Leveraging Public Data for Practical Private Query Release

Liu, Terrance; Vietri, Giuseppe; Steinke, Thomas; Ullman, Jonathan; Wu, Steven (January 2021, International Conference on Machine Learning)
Full Text Available
Leveraging Public Data for Practical Private Query Release

Liu, Terrance; Vietri, Giuseppe; Steinke, Thomas; Ullman, Jonathan; Wu, Zhiwei Steven (January 2021, Proceedings of the Thirty-eighth International Conference on Machine Learning)
Full Text Available
New Oracle-Efficient Algorithms for Private Synthetic Data Release

Vietri, Giuseppe; Tian, Grace; Bun, Mark; Steinke, Thomas; Wu, Steven (January 2020, Proceedings of the 37th International Conference on Machine Learning)
We present three new algorithms for constructing differentially private synthetic data, a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are oracle-efficient in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle’s optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art High-Dimensional Matrix Mechanism (McKenna et al., VLDB 2018), our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss epsilon).
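To make "approximately preserves the answers to a large collection of statistical queries" concrete, the sketch below measures the worst-case gap between a sensitive dataset and a candidate synthetic dataset over a workload of 2-way conjunction queries. It illustrates only this accuracy criterion, not differential privacy or any of the three algorithms; the workload and the names `workload`, `answer`, and `max_error` are assumptions chosen for the example.

```python
import itertools


def workload(d):
    """All 2-way conjunction queries over d binary attributes
    (a hypothetical workload chosen for illustration)."""
    return [(i, j, a, b)
            for i, j in itertools.combinations(range(d), 2)
            for a, b in itertools.product((0, 1), repeat=2)]


def answer(rows, q):
    """Fraction of rows satisfying the conjunction q = (i, j, a, b)."""
    i, j, a, b = q
    return sum(1 for r in rows if r[i] == a and r[j] == b) / len(rows)


def max_error(sensitive, synthetic, queries):
    """Largest absolute difference in query answers across the workload."""
    return max(abs(answer(sensitive, q) - answer(synthetic, q)) for q in queries)


# Hypothetical data: the synthetic rows need not match the sensitive ones,
# and the two datasets may have different sizes.
sensitive = [(0, 1, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
synthetic = [(0, 1, 1), (1, 1, 0), (0, 1, 0), (1, 0, 0)]
gap = max_error(sensitive, synthetic, workload(3))
```

A smaller `gap` means the synthetic data is a better stand-in for the sensitive data on this workload; a private algorithm additionally has to bound how much any single sensitive row can influence its output.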
Full Text Available
